Jointly Tracking and Separating Speech Sources Using Multiple Features and the Generalized Labeled Multi-Bernoulli Framework
Author
Abstract
This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, using sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually track only speaker locations, so ambiguity arises when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of vectors of associated speaker features (location, pitch and sound) as the multi-target multi-feature observation and characterizes the evolving features with corresponding transition models and an overall likelihood function. It thereby jointly tracks and separates each multi-feature speaker and addresses the spatial ambiguity problem. Numerical evaluation verifies that the proposed method can correctly track the locations of multiple speakers while separating their speech signals.
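The spatial ambiguity described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's GLMB filter: it assumes independent Gaussian likelihoods per feature, uses only two features (direction and pitch), and every name and number in it (`gaussian_pdf`, `joint_likelihood`, the speaker values, the standard deviations) is hypothetical. It only shows why scoring an observation on several features can separate two speakers that a location-only score cannot.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def joint_likelihood(obs, speaker, stds):
    """Product of per-feature Gaussian likelihoods.

    Assumption for illustration: features are conditionally independent.
    """
    like = 1.0
    for name, std in stds.items():
        like *= gaussian_pdf(obs[name], speaker[name], std)
    return like


# Hypothetical speakers: almost the same direction, clearly different pitch.
speakers = {
    "A": {"angle_deg": 30.0, "pitch_hz": 120.0},
    "B": {"angle_deg": 32.0, "pitch_hz": 210.0},
}
# Hypothetical measurement noise per feature.
stds = {"angle_deg": 5.0, "pitch_hz": 10.0}
# One observation that lies exactly between the two speakers in angle.
obs = {"angle_deg": 31.0, "pitch_hz": 205.0}

# Location alone cannot tell the speakers apart.
location_only = {
    name: gaussian_pdf(obs["angle_deg"], s["angle_deg"], stds["angle_deg"])
    for name, s in speakers.items()
}
# Adding pitch makes one association far more likely than the other.
joint = {name: joint_likelihood(obs, s, stds) for name, s in speakers.items()}
best = max(joint, key=joint.get)

print("location only:", location_only)
print("joint:", joint)
print("best match:", best)
```

In this toy setup the location-only scores for the two speakers are equal, while the joint score favors the speaker whose pitch matches the observation. A full tracker would additionally propagate each feature through its transition model over time, which this sketch omits.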
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Multi-Sensor Control for Multi-Object Bayes Filters
Sensor management in multi-object stochastic systems is a theoretically and computationally challenging problem. This paper presents a novel approach to the multi-target multi-sensor control problem within the partially observed Markov decision process (POMDP) framework. We model the multi-object state as a labeled multi-Bernoulli random finite set (RFS), and use the labeled multi-Bernoulli fil...
Statistical Information Fusion for Multiple-View Sensor Data in Multi-Object Tracking
This paper presents a novel statistical information fusion method to integrate multiple-view sensor data in multi-object tracking applications. The proposed method overcomes the drawbacks of the commonly used Generalized Covariance Intersection method, which considers constant weights allocated for sensors. Our method is based on enhancing the Generalized Covariance Intersection with adaptive w...
Labeled RFS-Based Track-Before-Detect for Multiple Maneuvering Targets in the Infrared Focal Plane Array
The problem of jointly detecting and tracking multiple targets from the raw observations of an infrared focal plane array is a challenging task, especially for the case with uncertain target dynamics. In this paper a multi-model labeled multi-Bernoulli (MM-LMB) track-before-detect method is proposed within the labeled random finite sets (RFS) framework. The proposed track-before-detect method c...
Speaker Tracking Using an Audio-visual Particle Filter
We present an approach for tracking a lecturer during the course of his speech. We use features from multiple cameras and microphones, and process them in a joint particle filter framework. The filter performs sampled projections of 3D location hypotheses and scores them using features from both audio and video. On the video side, the features are based on foreground segmentation, multi-view fa...
Journal title:
- CoRR
Volume abs/1710.10432 Issue
Pages -
Publication date 2017